Evaluation of Naturalness of Synthesized Speech with Different Prosodic Models
Abstract
Obtaining natural-sounding synthesized speech is the main goal of modern speech synthesis research. Naturalness strongly depends on the prosody model used in the text-to-speech (TTS) system. This paper evaluates synthesized speech with respect to the prosodic model used. Our Russian VitalVoice TTS is a unit-selection concatenative system. We describe the two approaches to prosody prediction used in it: a rule-based approach and a hybrid approach based on hidden Markov models (HMMs). We conducted an experiment to evaluate the naturalness of synthesized speech. Four variants of synthesized speech, differing in the applied approach and the speech corpus size, were tested, and natural speech samples were also included in the test. Subjects rated each sample from 0 to 5 according to its naturalness. The experiment shows that speech synthesized with the hybrid HMM-based approach sounds more natural than the other synthetic variants. The last section discusses the results and directions for further investigation and improvement.
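The rating procedure in the abstract (each sample scored from 0 to 5, grouped by speech variant) can be summarized per variant roughly as follows. This is a minimal sketch, not the authors' analysis: the function name, the variant labels, and the ratings are all hypothetical placeholders, and the normal-approximation confidence interval is an assumption, since the abstract does not say which statistics were used.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_ratings(ratings_by_variant):
    """Return {variant: (mean rating, half-width of an approximate 95% CI)}.

    Assumes every rating lies on the 0-to-5 scale mentioned in the abstract.
    The interval uses a normal approximation (1.96 * standard error), which
    is an assumption of this sketch, not something stated in the source.
    """
    summary = {}
    for variant, ratings in ratings_by_variant.items():
        if not ratings:
            raise ValueError(f"no ratings for variant {variant!r}")
        for r in ratings:
            if not 0 <= r <= 5:
                raise ValueError(f"rating {r} outside the 0-5 scale")
        avg = mean(ratings)
        # stdev needs at least two data points; fall back to 0.0 otherwise.
        if len(ratings) > 1:
            half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
        else:
            half_width = 0.0
        summary[variant] = (avg, half_width)
    return summary


# Placeholder ratings invented purely for illustration; they are NOT
# results from the paper, and the variant names are hypothetical.
example_ratings = {
    "hypothetical_variant_a": [3, 4, 4, 5],
    "hypothetical_variant_b": [2, 3, 3, 4],
}

for name, (avg, half_width) in summarize_ratings(example_ratings).items():
    print(f"{name}: mean={avg:.2f} +/- {half_width:.2f}")
```

Comparing the per-variant means, together with their intervals, is one simple way to check whether one variant is rated as more natural than another.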
Similar resources
Improvement of prosodic characteristic in Vietnamese speech synthesis system base on HMM
The key factors helping people to understand the synthesized voices of text-to-speech system are the naturalness and the intelligibility. However, making more natural voices remains a difficult task because of the speech data’s scarcity. With data limited corpus, prosodic information such as tone, intonation, Part-of-Speech is added to ensure the quality of synthetic speech. In the paper, we in...
Prosody Analysis of L2 English for Naturalness Evaluation Through Speech Modification
This study investigates how different prosodic features affect native speakers' naturalness judgement of L2 English speech by Chinese students. Through subjective judgment by native speakers and objectively measured prosodic features, timing and pitch related prosodic features, as well as segmental goodness of pronunciation have been found to play key roles in native speakers' perception of nat...
Naturalness Judgement of Prosodic Variation of Japanese Utterances with Prosody Modified Stimuli
This study aims to identify the crucial prosodic factor for native speakers’ naturalness judgement of L2 pronunciation. Prosodic features are known to have more impact on the naturalness of L2 learners’ pronunciation than segmental features do. Among prosodic features, timing and pitch are looked at in this study as major prosodic factors which affect native speakers’ naturalness judgement of L...
Comparative evaluation of synthetic prosody with the PURR method
In order to evaluate the prosodic output of a speech synthesis system independently from its segmental quality, we have developed a special way to delexicalize speech stimuli which we call PURR (Prosody Unveiling through Restricted Representation). We compared the use of PURR stimuli for the evaluation of prosodic naturalness in three different test designs: magnitude estimation (ME), categoric...
Prosodic Analysis and Modelling for Malay Emotional Speech Synthesis
This paper discusses an emotional prosody generator for a Malay speech synthesis system that can re-synthesize the selected vocal emotion from neutral synthesized speech output and improve the naturalness by adopting rule-based prosody conversion techniques. The role of prosodic features in emotional expression, particularly fundamental frequency and duration, has been widely investigated in sev...